Smart Paper Notes

COVID-19 Disease Progression Prediction via Audio Signals: A Longitudinal Study

Ting Dang, Jing Han, Tong Xia, Dimitris Spathis, Erika Bondareva, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Andres Floto

Category: Machine Learning

2022-01-04

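Recent work has shown the potential of using audio data to screen for COVID-19. However, monitoring disease progression through audio, and especially recovery from COVID-19, has received very little exploration. Tracking disease progression characteristics and recovery patterns could yield valuable insights and lead to more timely treatment or treatment adjustment, as well as better resource management in healthcare systems. The primary objective of this study is to explore the potential of longitudinal audio dynamics for COVID-19 monitoring using sequential deep learning techniques, focusing on disease progression prediction and, in particular, recovery trend prediction. We analysed crowdsourced respiratory audio data from 212 individuals over periods of 5 to 385 days, alongside their self-reported COVID-19 test results. We first explore the benefits of capturing the longitudinal dynamics of audio biomarkers for COVID-19 detection. The strong performance, an AUC-ROC of 0.79 with a sensitivity of 0.75 and a specificity of 0.70, supports the effectiveness of the approach compared with methods that do not exploit longitudinal dynamics. We further examine the predicted disease progression trajectories, which are highly consistent with the longitudinal test results, with a correlation of 0.76 in the test cohort and 0.86 in a subset of the test cohort in which 12 participants reported recovery. Our findings suggest that monitoring COVID-19 progression via longitudinal audio data has great potential for tracking individuals' disease progression and recovery.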
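
For readers less familiar with the detection metrics quoted in this abstract, the sketch below shows how sensitivity and specificity are computed from binary predictions. The function name and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the fraction of positive cases that are detected, and specificity is the fraction of negative cases that are correctly cleared.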

Exploring System Performance of Continual Learning for Mobile and Embedded Sensing Applications

Young D. Kwon, Jagmohan Chauhan, Abhishek Kumar, Pan Hui, Cecilia Mascolo

Category: Machine Learning | Artificial Intelligence

2021-10-25

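Continual learning approaches help deep neural network models adapt and learn incrementally by trying to solve catastrophic forgetting. However, whether these existing approaches, traditionally applied to image-based tasks, work with the same efficacy on the sequential time series data generated by mobile or embedded sensing systems remains an open question. To address this gap, we conduct the first comprehensive empirical study that quantifies the performance of three predominant continual learning schemes (regularization, replay, and replay with exemplars) on six datasets from three mobile and embedded sensing applications, across a range of scenarios with different learning complexities. More specifically, we implement an end-to-end continual learning framework on edge devices. We then investigate the generalizability of different continual learning methods and the trade-offs among their performance, storage, computational cost, and memory footprint. Our findings suggest that replay with exemplar-based schemes such as iCaRL offers the best performance trade-offs, even in complex scenarios, at the expense of some storage space (a few MB) for training exemplars (1% to 5%). We also demonstrate for the first time that it is feasible and practical to run continual learning on-device with a limited memory budget. In particular, the latency measured on two types of mobile and embedded devices suggests that both the incremental learning time (a few seconds to 4 minutes) and the training time (1 to 75 minutes) are acceptable, because training can take place on the device while it is charging, thereby ensuring complete data privacy. Finally, we present some guidelines for practitioners who want to apply the continual learning paradigm to mobile sensing tasks.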
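
The idea of replay with stored exemplars can be sketched as follows. This is a minimal illustration only: it picks exemplars uniformly at random per class, which is an assumption made for brevity (iCaRL itself selects exemplars by herding), and the function names `select_exemplars` and `replay_batch` are hypothetical.

```python
import random
from collections import defaultdict


def select_exemplars(samples, labels, fraction=0.05, seed=0):
    """Keep a small per-class subset of old data as a replay memory.

    Assumption: uniform random selection, not the herding used by iCaRL.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    memory = {}
    for y, xs in by_class.items():
        k = max(1, round(fraction * len(xs)))  # store at least one exemplar
        memory[y] = rng.sample(xs, k)
    return memory


def replay_batch(new_samples, new_labels, memory):
    """Mix new-task data with the stored exemplars of earlier classes."""
    xs, ys = list(new_samples), list(new_labels)
    for y, exemplars in memory.items():
        xs.extend(exemplars)
        ys.extend([y] * len(exemplars))
    return xs, ys


# Hypothetical toy data: 200 old samples in two classes, then a new class 2.
old_x = list(range(200))
old_y = [i % 2 for i in old_x]
memory = select_exemplars(old_x, old_y, fraction=0.05)
mixed_x, mixed_y = replay_batch([1000, 1001], [2, 2], memory)
print(len(memory[0]), len(memory[1]), len(mixed_x))
```

Storing only a few percent of the old data keeps the memory footprint small while still giving the model examples of earlier classes during each incremental update.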

Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Yiyun Zhao, Jiarong Jiang, Yiqun Hu, Wuwei Lan, Henry Zhu, Anuj Chauhan, Alexander Li, Lin Pan, Jun Wang, Chung-Wei Hang

Category: Natural Language Processing

2022-12-17

Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
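
One way to picture schema-distance-weighted column sampling is sketched below. The abstract does not give the weighting function, so the exponential decay over foreign-key hops, the toy schema, and the names `table_distances` and `sample_column` are all assumptions made for illustration.

```python
import random
from collections import deque


def table_distances(fk_graph, start):
    """Breadth-first hop count from `start` over foreign-key links."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor in fk_graph.get(table, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[table] + 1
                queue.append(neighbor)
    return dist


def sample_column(columns, fk_graph, anchor_table, decay=0.5, rng=random):
    """Sample one (table, column) pair, favoring tables close to the anchor.

    Assumption: weight = decay ** distance; unreachable tables are skipped.
    """
    dist = table_distances(fk_graph, anchor_table)
    reachable = [(t, c) for t, c in columns if t in dist]
    weights = [decay ** dist[t] for t, _ in reachable]
    return rng.choices(reachable, weights=weights, k=1)[0]


# Hypothetical toy schema with undirected foreign-key adjacency.
fk_graph = {
    "singer": ["singer_in_concert"],
    "singer_in_concert": ["singer", "concert"],
    "concert": ["singer_in_concert", "stadium"],
    "stadium": ["concert"],
}
columns = [
    ("singer", "name"),
    ("singer_in_concert", "concert_id"),
    ("concert", "year"),
    ("stadium", "capacity"),
]
print(sample_column(columns, fk_graph, "singer", rng=random.Random(0)))
```

Compared with independent uniform sampling, this makes columns from distant, weakly related tables less likely to appear together in one synthetic query.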

Interactive Concept Bottleneck Models

Kushal Chauhan, Rishabh Tiwari, Jan Freyberg, Pradeep Shenoy, Krishnamurthy Dvijotham

Category: Machine Learning | Artificial Intelligence

2022-12-14

Concept bottleneck models (CBMs) (Koh et al. 2020) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the label to some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms a static approach proposed in Koh et al. (2020) as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert and OAI datasets.
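
A policy that combines concept uncertainty with concept influence could look roughly like the sketch below. The abstract does not specify how the two terms are combined, so the product of binary entropy and absolute influence, the concept names, and the function `choose_concepts` are assumptions for illustration rather than the authors' method.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) concept prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def choose_concepts(concept_probs, influence, budget):
    """Rank concepts by uncertainty times influence and return the top `budget`.

    Assumption: score = entropy(p) * |influence|; the paper may combine them differently.
    """
    scores = {
        name: binary_entropy(p) * abs(influence[name])
        for name, p in concept_probs.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Hypothetical concept predictions and influence values.
concept_probs = {"wing_color_black": 0.5, "beak_long": 0.95, "belly_white": 0.6}
influence = {"wing_color_black": 0.2, "beak_long": 2.0, "belly_white": 1.5}
picked = choose_concepts(concept_probs, influence, budget=2)
print(picked)
```

The toy values illustrate why both terms matter: the most uncertain concept here has little influence on the final label, so asking a human about it would be a poor use of a limited interaction budget.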

PowRL: A Reinforcement Learning Framework for Robust Management of Power Networks

Anandsingh Chauhan, Mayank Baranwal, Ansuma Basumatary

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-12-05

Power grids across the world play an important societal and economic role by providing uninterrupted, reliable, and transient-free power to industries, businesses, and household consumers. With the advent of renewable power resources and EVs resulting in uncertain generation and highly dynamic load demands, it has become ever more important to ensure robust operation of power networks through suitable management of transient stability issues and to localize blackout events. In light of the ever-increasing stress on modern grid infrastructure and grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events and to reliably maintain electricity everywhere on the network at all times. PowRL leverages a novel heuristic for overload management, along with RL-guided decision making on optimal topology selection, to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top-performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis shows state-of-the-art performance by the PowRL agent in some of the test scenarios.

Adversarial De-confounding in Individualised Treatment Effects Estimation

Vinod Kumar Chauhan, Soheila Molaei, Marzia Hoque Tania, Anshul Thakur, Tingting Zhu, David Clifton

Category: Machine Learning | Artificial Intelligence

2022-10-19

Observational studies have recently received significant attention from the machine learning community due to the increasing availability of non-experimental observational data and the limitations of experimental studies, such as considerable cost, impracticality, and small, less representative samples. In observational studies, de-confounding is a fundamental problem of individualised treatment effects (ITE) estimation. This paper proposes disentangled representations with adversarial training to selectively balance the confounders in the binary treatment setting for ITE estimation. The adversarial training of the treatment policy selectively encourages treatment-agnostic balanced representations for the confounders and helps to estimate the ITE in observational studies via counterfactual inference. Empirical results on synthetic and real-world datasets, with varying degrees of confounding, show that the proposed approach achieves lower ITE estimation error than state-of-the-art methods.

Robust Causality and False Attribution in Data-Driven Earth Science Discoveries

Elizabeth Eldhose, Tejasvi Chauhan, Vikram Chandel, Subimal Ghosh, Auroop R. Ganguly

Category: Machine Learning (Statistics)

2022-09-26

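Causal and attribution studies are essential for earth science discoveries and critical for informing climate, ecology, and water policies. However, the current generation of methods needs to keep pace with the complexity of scientific and stakeholder challenges, with data availability, and with the adequacy of data-driven methods. Unless carefully informed by physics, these methods run the risk of conflating correlation with causation or of being overwhelmed by estimation inaccuracies. Given that natural experiments, controlled trials, interventions, and counterfactual examinations are often impractical, information-theoretic methods have been developed and are being continually refined in the earth sciences. Here we show that transfer-entropy-based causal graphs, which have recently become popular in the earth sciences through high-profile discoveries, can be spurious even when augmented with statistical significance. We develop a subsample-based ensemble approach for robust causality analysis. Simulated data and observations in climate and ecohydrology suggest the robustness and consistency of this approach.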
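
The general shape of a subsample-based ensemble can be sketched as below: a directed dependence score is recomputed on many subsamples, and an edge is trusted only if it is detected in most of them. This is not the authors' procedure. As a loudly labeled stand-in, the score here is a lagged Pearson correlation rather than transfer entropy, and the window length, threshold, and function names are hypothetical.

```python
import random


def lagged_corr(x, y, lag=1):
    """Pearson correlation between x[t] and y[t + lag] (stand-in for transfer entropy)."""
    a, b = x[:-lag], y[lag:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0


def edge_frequency(x, y, score_fn, threshold, n_subsamples=100, window=50, seed=0):
    """Fraction of contiguous subsamples in which the x -> y score exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_subsamples):
        start = rng.randrange(0, len(x) - window + 1)
        xs, ys = x[start:start + window], y[start:start + window]
        if abs(score_fn(xs, ys)) > threshold:
            hits += 1
    return hits / n_subsamples


# Hypothetical simulated data in which x drives y with a one-step delay.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.9 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]
freq_xy = edge_frequency(x, y, lagged_corr, threshold=0.5)
freq_yx = edge_frequency(y, x, lagged_corr, threshold=0.5)
print(freq_xy, freq_yx)
```

An edge that appears in only a small fraction of subsamples is a candidate for a spurious link, even if a single full-sample test marks it as significant.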

Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++

Barath Mohan Umapathi, Kushal Chauhan, Pradeep Shenoy, Devarajan Sridharan

Category: Machine Learning

2022-08-29

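Reliable outlier detection is critical for real-world deployment of deep learning models. Although extensively studied, likelihoods produced by deep generative models have been largely dismissed as impractical for outlier detection. First, deep generative model likelihoods are readily biased by low-level input statistics. Second, many solutions for correcting these biases are computationally expensive or generalize poorly to complex, natural datasets. Here, we explore outlier detection with a state-of-the-art deep autoregressive model: PixelCNN++. We show that biases in PixelCNN++ likelihoods arise primarily from predictions based on local dependencies. We propose two families of bijective transformations, which we term "shaking" and "stirring", that ameliorate low-level biases and isolate the contribution of long-range dependencies to PixelCNN++ likelihoods. These transformations are computationally inexpensive and readily applied at evaluation time. We evaluate our approaches extensively with five grayscale and six natural image datasets and show that they achieve or exceed state-of-the-art outlier detection performance. In sum, lightweight remedies suffice to achieve robust outlier detection on images with deep generative models.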

Friendliness Of Stack Overflow Towards Newbies

Aneesh Tickoo, Shweta Chauhan, Gagan Raj Gupta

Category: Machine Learning

2022-08-21

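In today's digital world, there are many online question-and-answer platforms, such as Stack Exchange, Quora, and GFG, that serve as a medium for people to communicate and help each other. In this paper, we analyze the effectiveness of Stack Overflow in helping newcomers to programming. Every user on the platform goes through a journey. For the first 12 months, we consider a user to be a newbie. After 12 months, users fall into one of the following categories: Experienced, Lurkers, or Inquisitive. Each question has tags assigned to it, and we observe that questions with certain tags receive faster responses, indicating that the community in that field is more active than in others. The platform grew steadily until 2013, after which activity started to decline, but during the 2020 pandemic we can see rejuvenated activity on the platform.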

RadTex: Learning Efficient Radiograph Representations from Text Reports

Keegan Quigley, Miriam Cha, Ruizhi Liao, Geeticka Chauhan, Steven Horng, Seth Berkowitz, Polina Golland

Category: Computer Vision

2022-08-05

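Automated analysis of chest radiography using deep learning has tremendous potential to enhance the clinical diagnosis of diseases in patients. However, deep learning models typically require large amounts of annotated data to achieve high performance, which is often an obstacle to medical domain adaptation. In this paper, we build a data-efficient learning framework that uses radiology reports to improve medical image classification performance with limited labeled data (fewer than 1000 examples). Specifically, we examine image-captioning pretraining as a way to learn high-quality medical image representations that can be trained with fewer examples. After jointly pretraining a convolutional encoder and a transformer decoder, we transfer the learned encoder to various classification tasks. Averaged over 9 pathologies, we find that our model achieves higher classification performance than ImageNet-supervised and in-domain supervised pretraining when labeled training data is limited.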
